With this project, we aim to present a simple yet cohesive and conclusive approach to one of the most relevant application fields in data science: smart city planning. For this purpose, we chose a relatively small and simple dataset that contains all traffic violations recorded since 2012 in Montgomery County, Maryland. This dataset, though simple, provides accurate and descriptive information on the nature of the traffic violations. In particular, we focus on one specific kind: traffic violations driven by alcohol consumption. We chose this subset in order to provide sound and conclusive insight into how to potentially reduce those violations, which are responsible for a significant number of deaths and injuries.
This project is divided into three sections. In the first, we provide some preliminary information to describe the dataset and explore how traffic violations are distributed along different dimensions. Subsequently, we overlay traffic violations with bars serving alcohol, as a means to show potential explanations for the nature and number of these violations. Similarly, we also overlay metropolitan transportation stops in order to assess the proximity of these stops to the bars whose patrons seem to incur a high number of traffic violations. Finally, we provide conclusions and guidelines on possible ways to optimise the layout of public transportation stops and the transportation frequency, in order to possibly reduce the number of traffic violations caused by alcohol consumption.
From the entire dataset, and as depicted in the plots below, we focus exclusively on alcohol-induced traffic violations, which constitute only \(3.6\%\) of all records. Even though this proportion might seem small, in subsequent sections we will show that the volume of data is adequate to provide insight into the current situation in Montgomery County.
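As an illustration of how such a proportion can be computed, the following is a minimal sketch with pandas. It assumes that alcohol involvement is stored in a column named `Alcohol` holding "Yes"/"No" strings; that column name, the helper `alcohol_share` and the toy data are assumptions made for this example, not a description of the exact filter used in this project.

```python
import pandas as pd


def alcohol_share(violations: pd.DataFrame, flag_column: str = "Alcohol") -> float:
    """Return the fraction of rows flagged as alcohol-related.

    Assumption: `flag_column` holds "Yes"/"No" strings (hypothetical layout).
    """
    flags = violations[flag_column].astype(str).str.strip().str.lower()
    is_alcohol = flags.eq("yes")
    # The mean of a boolean Series is the share of True values.
    return float(is_alcohol.mean())


# Toy data standing in for the real file; the values are made up.
toy = pd.DataFrame({"Alcohol": ["No", "Yes", "No", "No"]})
print(f"{alcohol_share(toy):.1%}")  # prints 25.0%
```

On the real dataset, the same function would be applied to the full table after loading it, for instance with `pd.read_csv`.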
This family of traffic violations has accounted for 9 deaths and almost 900 injured people since 2012, as the second plot above shows. The decrease in the number of injured people visible in that plot might suggest a reduction in the number of accidents. However, as the third plot shows, the number of traffic violations triggered by alcohol consumption has been steadily increasing over the years, and the trend for 2016 seems to go in the same direction. Consequently, we consider that finding ways to reduce this number is not only reasonable but also desirable, as the number of injured people remains high.
In order to understand the nature of these traffic violations, we decided to analyse their time of occurrence along three different axes: day of the week, time of day, and the combination of both (time of day for each day, displayed as a trellis plot). All three plots are displayed below:
It can clearly be seen that late night and early morning hours present the highest proportion of traffic violations, which immediately suggests nightlife activity as the main cause of these violations. This is also supported by the following plot, which shows that most traffic violations occur on weekend days (that is, Friday, Saturday and Sunday).
This information alone is not enough to conclude that the global pattern also holds locally, that is, that it is not driven by a single day on which violations concentrate at night. We therefore decided to display a trellis plot that breaks the previous information down on a day-by-day basis:
We can clearly see, then, that this pattern (traffic violations occurring during late night and early morning hours) is repeated throughout the entire week, although the highest proportion is found on weekend days. As a final step, it is important to determine whether the pattern holds during the entire year. If so, we can conclude that weekend nightlife is indeed the main cause of these violations.
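The day-by-day breakdown described above can be sketched as a weekday-by-hour count table, which is the data behind a trellis plot of this kind. This is a minimal sketch with pandas; the column name `stop_time`, the helper `violations_by_day_and_hour` and the toy timestamps are assumptions introduced for illustration only.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def violations_by_day_and_hour(stops: pd.DataFrame,
                               timestamp_column: str = "stop_time") -> pd.DataFrame:
    """Count violations per (weekday, hour); rows are weekdays, columns are hours 0-23."""
    ts = pd.to_datetime(stops[timestamp_column])
    parts = pd.DataFrame({"weekday": ts.dt.day_name(), "hour": ts.dt.hour})
    counts = parts.groupby(["weekday", "hour"]).size().unstack(fill_value=0)
    # Make sure every weekday and every hour is present, in calendar order.
    return counts.reindex(index=WEEKDAYS, columns=list(range(24)), fill_value=0)


# Toy timestamps (made up): two on a Saturday night, one Sunday, one Wednesday.
toy = pd.DataFrame({"stop_time": ["2015-06-06 01:30", "2015-06-06 01:45",
                                  "2015-06-07 02:10", "2015-06-10 14:00"]})
table = violations_by_day_and_hour(toy)
print(table.loc["Saturday", 1])  # prints 2
```

Dividing each row by its row sum would give per-day proportions, which makes weekdays with very different totals easier to compare.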
In the plot above, even though we see a slight increase in traffic violations during the months of November and December, the number of violations per month does not differ significantly. Hence, we can conclude that measures applied during weekends will be effective throughout the entire year, which is a highly desirable characteristic for any measures we might suggest.
The second part of our story focuses on the geographical aspects of the phenomenon we are exploring. First, we wanted to look at the distribution of traffic violations among the administrative territorial units that are called police districts in the dataset we are working with. There are 7 main districts in Montgomery County and, as expected, they show different violation frequencies, both overall and for alcohol-related violations.
The direct comparison displayed in the chart above yields two main takeaways that we are going to use later on:
Some districts indeed have a higher frequency of traffic violations, and alcohol-related violations generally follow the same distribution across districts. This might reflect their relative size, meaning the population and traffic in these districts, but it may also point to certain public places that affect this distribution, as well as their respective locations within the districts.
Silver Spring is the only district with a smaller proportion of alcohol-related violations than of violations in the overall pool of observed police records. All other districts show relatively similar ratios, and alcohol violations are seemingly redistributed from Silver Spring to all other districts. This fact leads us to two more hypotheses we would like to check: firstly, whether Silver Spring indeed has fewer public places, eventually “losing” alcohol violations to neighboring districts; secondly, whether this district has a higher traffic density, which causes more general violations unrelated to drunk driving.
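The comparison between each district's share of all violations and its share of alcohol-related violations can be sketched as follows. This is a minimal sketch with pandas; the column names `SubAgency` and `Alcohol`, the helper `district_shares` and the placeholder district labels are assumptions made for this example, and the toy numbers are not taken from the dataset.

```python
import pandas as pd


def district_shares(violations: pd.DataFrame,
                    district_column: str = "SubAgency",
                    flag_column: str = "Alcohol") -> pd.DataFrame:
    """Compare each district's share of all violations with its share of alcohol ones."""
    is_alcohol = violations[flag_column].eq("Yes")
    overall = violations[district_column].value_counts(normalize=True)
    alcohol = violations.loc[is_alcohol, district_column].value_counts(normalize=True)
    shares = pd.DataFrame({"overall_share": overall, "alcohol_share": alcohol}).fillna(0.0)
    # Negative values mark districts under-represented among alcohol violations.
    shares["difference"] = shares["alcohol_share"] - shares["overall_share"]
    return shares.sort_index()


# Toy data with placeholder district labels; the counts are made up.
toy = pd.DataFrame({
    "SubAgency": ["District A"] * 4 + ["District B"] * 4,
    "Alcohol": ["Yes", "No", "No", "No", "Yes", "Yes", "Yes", "No"],
})
shares = district_shares(toy)
print(shares)
```

A district behaving like Silver Spring in the chart would show a negative `difference`, while the remaining districts would show small positive values.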
We dive deeper into the geographical structure of the distribution of alcohol violations by plotting them on a map:
The very first thing we observe on this map is that violations tend to cluster around certain points. What is more, each district has its own centers of gravity, which we will try to identify further on.
As a possible preliminary explanation, this clustering of traffic violations could be related to the locations of bars, pubs and other drinking establishments in the area. As a side note, we obtained these locations by querying the public Yelp API. Specifically, we queried for bars and restaurants in Montgomery County, Maryland, with alcohol as a keyword. Additionally, subway station locations have been added to the map in order to better understand commuting patterns in the area.
Black diamonds represent public places, where the size of each diamond stands for the rating of the place on Yelp, which is the proxy variable for the popularity of the place that we decided to use in our analysis.
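One way to quantify the visual relation between violations and places on the map is to compute, for each violation, the great-circle distance to the nearest bar or station. The sketch below uses the haversine formula with the standard library only; the helper names, the place names and the coordinates are hypothetical and serve only as an illustration of the idea, not as the method used to build the map.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_place(point, places):
    """Return (name, distance_km) of the place closest to `point`.

    `point` is a (lat, lon) pair; `places` is a list of (name, lat, lon) tuples.
    """
    lat, lon = point
    candidates = ((name, haversine_km(lat, lon, p_lat, p_lon))
                  for name, p_lat, p_lon in places)
    return min(candidates, key=lambda item: item[1])


# Hypothetical coordinates, for illustration only.
bars = [("bar_a", 39.01, -77.00), ("bar_b", 39.20, -77.00)]
name, distance = nearest_place((39.00, -77.00), bars)
print(name, round(distance, 2))
```

Applying `nearest_place` to every alcohol-related violation, once against the bars and once against the stations, would give two distance distributions that can be compared directly.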
In general, the map suggests that those violation clusters indeed correlate with certain popular public places and transportation stations. A closer look at different areas provides additional insight: